Concurrent reinforcement learning as a rehearsal for decentralized planning under uncertainty
Authors
Abstract
Decentralized partially-observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Reinforcement learning (RL) based approaches have recently been proposed for the distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and policy execution are identical. This assumption may not always be necessary, and it can make learning difficult. We propose a novel RL approach in which agents rehearse with information that will not be available during policy execution, yet learn policies that do not explicitly rely on this information. We show experimentally that incorporating such information can ease the difficulties faced by non-rehearsal-based learners, and we demonstrate fast, (near-)optimal performance on many existing benchmark Dec-POMDP problems.
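To make the "learning as rehearsal" idea concrete, the following is a minimal Python sketch, not the authors' algorithm: it uses a hypothetical single-agent, tiger-style toy problem (the environment, state/action names, reward values, and hyperparameters are all illustrative assumptions). During rehearsal the learner is allowed to see the hidden state and updates values Q(history, state, action); the policy it executes afterwards depends only on the local observation history, with the hidden state averaged out using rehearsal-time visit counts.

```python
# Minimal sketch of rehearsal-based learning on a toy partially observable task.
# Everything here (environment, rewards, hyperparameters) is illustrative only.
import random
from collections import defaultdict

ACTIONS = ["listen", "open-left", "open-right"]
STATES = ["tiger-left", "tiger-right"]

def reset_state():
    return random.choice(STATES)

def step(state, action):
    # Listening costs -1 and yields a noisy hint; opening a door ends the episode.
    if action == "listen":
        obs = state if random.random() < 0.85 else \
              (STATES[0] if state == STATES[1] else STATES[1])
        return state, obs, -1.0, False
    good = (action == "open-left" and state == "tiger-right") or \
           (action == "open-right" and state == "tiger-left")
    return reset_state(), "start", (10.0 if good else -100.0), True

Q = defaultdict(float)       # rehearsal values Q[(history, hidden_state, action)]
counts = defaultdict(int)    # visit counts used to estimate P(state | history)
alpha, gamma, eps = 0.1, 0.95, 0.1

def rehearsal_greedy(h, s):
    # During rehearsal the hidden state s is available to guide action selection.
    return max(ACTIONS, key=lambda a: Q[(h, s, a)])

for _ in range(20000):
    s, h, done = reset_state(), ("start",), False
    while not done:
        counts[(h, s)] += 1
        a = random.choice(ACTIONS) if random.random() < eps else rehearsal_greedy(h, s)
        s2, obs, r, done = step(s, a)
        h2 = (h + (a, obs))[-6:]                      # finite, truncated history
        target = 0.0 if done else max(Q[(h2, s2, b)] for b in ACTIONS)
        Q[(h, s, a)] += alpha * (r + gamma * target - Q[(h, s, a)])
        s, h = s2, h2

def execution_policy(h):
    # State-free policy: average rehearsal values over hidden states,
    # weighted by how often each state co-occurred with this history.
    total = sum(counts[(h, s)] for s in STATES) or 1
    return max(ACTIONS,
               key=lambda a: sum(counts[(h, s)] / total * Q[(h, s, a)] for s in STATES))

print(execution_policy(("start",)))   # typically prints "listen"
```

The sketch only illustrates the general principle: extra rehearsal-time information (here, the hidden state) can speed up value estimation, while the executed policy remains a function of the observation history alone. How to exploit such information without biasing the resulting state-free policies is exactly the question the paper addresses.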
Similar Papers
Rehearsal-Based Multi-agent Reinforcement Learning of Decentralized Plans
Decentralized partially-observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Reinforcement learning (RL) based approaches have been recently proposed for distributed solution of Dec-POMDPs ...
Reinforcement Learning for Decentralized Planning Under Uncertainty (Doctoral Consortium)
Decentralized partially-observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. But in real-world scenarios, model parameters may not be known a priori, or may be difficult to specify. We prop...
Optimizing decentralized production–distribution planning problem in a multi-period supply chain network under uncertainty
Decentralized supply chain management is found to be significantly relevant in today’s competitive markets. Production and distribution planning is posed as an important optimization problem in supply chain networks. Here, we propose a multi-period decentralized supply chain network model with uncertainty. The imprecision related to uncertain parameters like demand and price of the final produc...
Decentralized Planning for Self-Adaptation in Multi-cloud Environment
The runtime management of Internet of Things (IoT) oriented applications deployed in multi-clouds is a complex issue due to the highly heterogeneous and dynamic execution environment. To effectively cope with such an environment, the cross-layer and multi-cloud effects should be taken into account and a decentralized self-adaptation is a promising solution to maintain and evolve the application...
Large-Scale Planning Under Uncertainty: A Survey
Our research area is planning under uncertainty, that is, making sequences of decisions in the face of imperfect information. We are particularly concerned with developing planning algorithms that perform well in large, real-world domains. This paper is a brief introduction to this area of research, which draws upon results from operations research (Markov decision processes), machine learning ...